SECRETED EXPRESSED SEQUENCE TAGS (sESTs)

FIELD OF THE INVENTION

The present invention provides novel polynucleotides which are expressed sequence 
tags (ESTs) for secreted proteins. 

BACKGROUND OF THE INVENTION 

Gargantuan efforts have been employed by various investigational projects to
randomly sequence portions of naturally-occurring cDNAs. The rationale behind this approach
to identifying and sequencing genes is founded on two basic principles: (1) that transcribed
cDNAs represent the product of the most important genes, namely those that are actually
expressed in vivo, and (2) that efforts to sequence genes and other portions of the genome of
target organisms which are not actually expressed waste substantial effort on areas not likely
to yield genetic information of therapeutic importance. Thus, the high-throughput sequencing
efforts focus on only those portions of the genome which are expressed. The randomly
produced cDNA sequences represent "expressed sequence tags" or "ESTs", which identify and
can be used as probes for the longer, full-length cDNA or genomic sequence from which they
were transcribed.

Although this "shortcut" approach to genomic sequencing presents savings of effort
compared to sequencing of the complete genome, it still produces a vast array of ESTs which
may not be directly useful as protein therapeutics. To date, the majority of protein-related drug
discovery has focused on the use of secreted proteins to produce a desired therapeutic effect.
Since the EST approach theoretically identifies all expressed proteins, it produces an EST
library which contains a mixture of secreted proteins (such as hormones, cytokines and
receptors) and non-secreted proteins (such as, for example, metabolic enzymes and cellular
structural proteins), without identifying which ESTs correspond to proteins falling into either
category. As a result, these methods are not optimally tailored to the needs of investigators
searching for secreted proteins because they must separate the secreted "wheat" from the
non-secreted "chaff", wasting effort and resources in the process.

Co-assigned U.S. Patent No. 5,536,637, which is incorporated herein by reference,
provides methods for focusing genomic sequencing efforts on sequences encoding the secreted
proteins which are of most interest for identification of protein therapeutics. The '637 patent
discloses a "signal sequence trap" which selectively identifies ESTs for secreted
proteins, namely "secreted expressed sequence tags" or "sESTs". It is to these sESTs that
the present invention is directed.

SUMMARY OF THE INVENTION 
The present invention provides for sESTs isolated from a variety of human
RNA/cDNA sources.

In preferred embodiments, the present invention provides an isolated polynucleotide
comprising a nucleotide sequence selected from the group consisting of:

SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5,
SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10,
SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15,
SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20,
SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25,
SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID NO:30,
SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33, SEQ ID NO:34, SEQ ID NO:35,
SEQ ID NO:36, SEQ ID NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40,
SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, SEQ ID NO:44, SEQ ID NO:45,
SEQ ID NO:46, SEQ ID NO:47, SEQ ID NO:48, SEQ ID NO:49, SEQ ID NO:50,
SEQ ID NO:51, SEQ ID NO:52, SEQ ID NO:53, SEQ ID NO:54, SEQ ID NO:55,
SEQ ID NO:56, SEQ ID NO:57, SEQ ID NO:58, SEQ ID NO:59, SEQ ID NO:60,
SEQ ID NO:61, SEQ ID NO:62, SEQ ID NO:63, SEQ ID NO:64, SEQ ID NO:65,
SEQ ID NO:66, SEQ ID NO:67, SEQ ID NO:68, SEQ ID NO:69, SEQ ID NO:70,
SEQ ID NO:71, SEQ ID NO:72, SEQ ID NO:73, SEQ ID NO:74, SEQ ID NO:75,
SEQ ID NO:76, SEQ ID NO:77, SEQ ID NO:78, SEQ ID NO:79, SEQ ID NO:80,
SEQ ID NO:81, SEQ ID NO:82, SEQ ID NO:83, SEQ ID NO:84, SEQ ID NO:85,
SEQ ID NO:86, SEQ ID NO:87, SEQ ID NO:88, SEQ ID NO:89, SEQ ID NO:90,
SEQ ID NO:91, SEQ ID NO:92, SEQ ID NO:93, SEQ ID NO:94, SEQ ID NO:95,
SEQ ID NO:96, SEQ ID NO:97, SEQ ID NO:98, SEQ ID NO:99, SEQ ID NO:100,
SEQ ID NO:101, SEQ ID NO:102, SEQ ID NO:103, SEQ ID NO:104, SEQ ID NO:105,
SEQ ID NO:106, SEQ ID NO:107, SEQ ID NO:108, SEQ ID NO:109, SEQ ID NO:110,
SEQ ID NO:111, SEQ ID NO:112, SEQ ID NO:113, SEQ ID NO:114, SEQ ID NO:115,
SEQ ID NO:116, SEQ ID NO:117, SEQ ID NO:118, SEQ ID NO:119, SEQ ID NO:120,
SEQ ID NO:121, SEQ ID NO:122, SEQ ID NO:123, SEQ ID NO:124, SEQ ID NO:125,
SEQ ID NO:126, SEQ ID NO:127.
TGGCTCACTC AATGACCTCC AGTTCTTTAG ATACAACAGT AAAGACAGGA AGTCTCAGCC 240 

CATGGGACTC TGGAGACAGG TGGAAGGAAT GGAGGATTGG AAGCAGGACA GCCAACTTCA 300 

GAAGGCCAGG GAGGACATCT TTATGGAGAC CCTGAAAGAC ATCGTGGAGT ATTACAACGA 360 

CAGTAACGGG TCTCACGTAT TGCAGGGAAG GTTTGGTTGT GAGATCGAGA ATAACAGAAG 420 

CAGCGGAGCA TTCTGGAAAT ATTACTATGA TGGAAAGGAC AAACTCGAG 469 

(2) INFORMATION FOR SEQ ID NO:484: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 516 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:484: 


GAATTCGGCC AAAGAGGCCT ACTACTTCTG TAGTCTCATC TTGAGTAAAA GAGAACCCAG 60 

CCAACTATGA AGTTCCTTGT CTTTGCCTTC ATCTTGGCTC TCATGGTTTC CATGATTGGA 120 

GCTGATTCAT CTGAAGAGTA TGGGTATGGC CCTTATCAGC CAGTTCCAGA ACAACCACTA 180 

TACCCACAAC CATACCAACC ACAATACCAA CCTGCCTCAA GGTCCTCCAC CTCCTCCAGG 240 

AAAGCCACAA GGACCACCCC CACAAGGAGG CAACAAACCT CAAGGTCCCC CACCTCCAGG 300 

AAAGCCACAA CGACCACCCC CACAAGGAGG CAGCAAGTCC CGAAGTTCTC GATCTCCTCC 360 

AGGAAAGCCA CAAGGACCAC CCCCACAAGG AGGCAACAAA CCTCAAGGTC CCCCACCTCC 420 

AGGAAAGCCA CAAGGACCAC CCCCACAAGG AGGCAGCAAG TCCCGAAGTG CCCGATCTCC 480 

TCCAGGAAAG CCACAAGGAC CATCCCACAA CTCGAG 516 


(2) INFORMATION FOR SEQ ID NO:485: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 357 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:485: 


GAATTCGGCC AAAGAGGCCT ACTTCACTTC AGCTTCACTG ACTTCTTGAC TCTCCTCTTG 60 

AGTAAAAGGA CTCAGCCAAC TATGAAGTTT TTTGTCTTTG CTTTAGTCTT GGCTCTCATG 120 

ATTTCCATGA TTAGOGCTGA TTCACATGAA AAGAGACATC ATGGGTATAG AAGAAAATTC 180 

CATGAAAAGC ATCATTCACA TCGAGAATTT CCATTTTATG GGGACTGTGG ATCAAATTAT 240 

CTATATGACA ATTGATATCC TTAGTAATCA TGGGGCATGA TTATAGAGGT TTGACTGGCA 300 

AATTCACTTT TACTCATTTA TTCTCATTCA TCACACCGCA AGTCTAGGCC TCTCGAG 357 

(2) INFORMATION FOR SEQ ID NO:486: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 643 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:486: 